OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Name Extraction and Formal Concept Analysis

Internal identifier: 000386 (Main/Exploration); previous: 000385; next: 000387

Authors: Kazem Taghva [United States]; Russell Beckley [United States]; Jeffrey Coombs [United States]

Source: Lecture Notes in Computer Science (ISSN 0302-9743), 2011

RBID : ISTEX:9FB8946D596A1C6C39D5A5684E2FF26F722CD51B

Abstract

Many applications of Formal Concept Analysis (FCA) start with a set of structured data such as objects and their properties. In practice, most of the readily available data is in the form of unstructured or semistructured text. A typical application of FCA assumes the extraction of objects and their properties by some other methods or techniques. For example, in the 2003 Los Alamos National Lab (LANL) project on Advanced Knowledge Integration In Assessing Terrorist Threats, a data extraction tool was used to mine the text for the structured data. In this paper, we provide a detailed description of our approach to the extraction of personal names for possible subsequent use in FCA. Our basic approach is to integrate statistics on names and other words into an adaptation of a Hidden Markov Model (HMM). We use lists of names and their relative frequencies compiled from U.S. Census data. We also use a list of non-name words along with their frequencies in a training set from our collection of documents. These lists are compiled into one master list to be used as a part of the design.
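
To make the general idea concrete, here is a minimal sketch of frequency-driven name tagging with a two-state HMM (NAME / OTHER) decoded by the Viterbi algorithm. It is not the authors' model: the state set, the transition and start probabilities, the floor value for unseen words, the tiny frequency tables, and the function names `emission` and `viterbi` are all assumptions invented for illustration.

```python
import math

STATES = ("NAME", "OTHER")

# Hypothetical relative frequencies. They stand in for a Census-derived name
# list and a corpus-derived non-name word list; the numbers are made up.
NAME_FREQ = {"kazem": 0.002, "russell": 0.02, "jeffrey": 0.03}
WORD_FREQ = {"the": 0.06, "said": 0.01, "report": 0.002}

UNSEEN = 1e-6  # assumed floor probability for a word missing from a list

# Assumed start and transition probabilities (illustrative only).
START = {"NAME": 0.2, "OTHER": 0.8}
TRANS = {
    "NAME": {"NAME": 0.5, "OTHER": 0.5},
    "OTHER": {"NAME": 0.2, "OTHER": 0.8},
}


def emission(state, word):
    """Look up the word's relative frequency in the list tied to the state."""
    table = NAME_FREQ if state == "NAME" else WORD_FREQ
    return table.get(word.lower(), UNSEEN)


def viterbi(tokens):
    """Return the most probable state sequence for tokens (log-space Viterbi)."""
    if not tokens:
        return []
    scores = {
        s: math.log(START[s]) + math.log(emission(s, tokens[0])) for s in STATES
    }
    backpointers = []
    for word in tokens[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for reaching s at this position.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(emission(s, word))
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With these toy tables, `viterbi(["the", "report", "said", "Jeffrey", "Russell"])` tags the last two tokens as NAME and the rest as OTHER. A realistic system would need far larger lists and probabilities estimated from training data.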

Url: https://api.istex.fr/document/9FB8946D596A1C6C39D5A5684E2FF26F722CD51B/fulltext/pdf
DOI: 10.1007/978-3-642-22688-5_28


Affiliations: School of Computer Science, University of Nevada, Las Vegas, Nevada, United States



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Name Extraction and Formal Concept Analysis</title>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
</author>
<author>
<name sortKey="Beckley, Russell" sort="Beckley, Russell" uniqKey="Beckley R" first="Russell" last="Beckley">Russell Beckley</name>
</author>
<author>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:9FB8946D596A1C6C39D5A5684E2FF26F722CD51B</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1007/978-3-642-22688-5_28</idno>
<idno type="url">https://api.istex.fr/document/9FB8946D596A1C6C39D5A5684E2FF26F722CD51B/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000662</idno>
<idno type="wicri:Area/Istex/Curation">000654</idno>
<idno type="wicri:Area/Istex/Checkpoint">000042</idno>
<idno type="wicri:doubleKey">0302-9743:2011:Taghva K:name:extraction:and</idno>
<idno type="wicri:Area/Main/Merge">000391</idno>
<idno type="wicri:Area/Main/Curation">000386</idno>
<idno type="wicri:Area/Main/Exploration">000386</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Name Extraction and Formal Concept Analysis</title>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="3">
<country>États-Unis</country>
<placeName>
<settlement type="city">Las Vegas</settlement>
<region type="state">Nevada</region>
</placeName>
<wicri:orgArea>School of Computer Science, University of Nevada</wicri:orgArea>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Beckley, Russell" sort="Beckley, Russell" uniqKey="Beckley R" first="Russell" last="Beckley">Russell Beckley</name>
<affiliation wicri:level="3">
<country>États-Unis</country>
<placeName>
<settlement type="city">Las Vegas</settlement>
<region type="state">Nevada</region>
</placeName>
<wicri:orgArea>School of Computer Science, University of Nevada</wicri:orgArea>
</affiliation>
</author>
<author>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<affiliation wicri:level="3">
<country>États-Unis</country>
<placeName>
<settlement type="city">Las Vegas</settlement>
<region type="state">Nevada</region>
</placeName>
<wicri:orgArea>School of Computer Science, University of Nevada</wicri:orgArea>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2011</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">9FB8946D596A1C6C39D5A5684E2FF26F722CD51B</idno>
<idno type="DOI">10.1007/978-3-642-22688-5_28</idno>
<idno type="ChapterID">28</idno>
<idno type="ChapterID">Chap28</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Many applications of Formal Concept Analysis (FCA) start with a set of structured data such as objects and their properties. In practice, most of the readily available data is in the form of unstructured or semistructured text. A typical application of FCA assumes the extraction of objects and their properties by some other methods or techniques. For example, in the 2003 Los Alamos National Lab (LANL) project on Advanced Knowledge Integration In Assessing Terrorist Threats, a data extraction tool was used to mine the text for the structured data. In this paper, we provide a detailed description of our approach to the extraction of personal names for possible subsequent use in FCA. Our basic approach is to integrate statistics on names and other words into an adaptation of a Hidden Markov Model (HMM). We use lists of names and their relative frequencies compiled from U.S. Census data. We also use a list of non-name words along with their frequencies in a training set from our collection of documents. These lists are compiled into one master list to be used as a part of the design.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Nevada</li>
</region>
<settlement>
<li>Las Vegas</li>
</settlement>
</list>
<tree>
<country name="États-Unis">
<region name="Nevada">
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
</region>
<name sortKey="Beckley, Russell" sort="Beckley, Russell" uniqKey="Beckley R" first="Russell" last="Beckley">Russell Beckley</name>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000386 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000386 | SxmlIndent | more
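
Without a Dilib installation, the record shown earlier can also be read with a general-purpose XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on an abridged copy of the record. As printed on this page, the record uses the `wicri:` prefix without declaring it, which a namespace-aware parser rejects, so the sketch first binds the prefix to a placeholder URI. That URI (`urn:example:wicri`) and the helper name `parse_record` are assumptions made for illustration, not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record shown on this page (most fields omitted).
RECORD = """<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Name Extraction and Formal Concept Analysis</title>
<author>
<name sortKey="Taghva, Kazem" first="Kazem" last="Taghva">Kazem Taghva</name>
</author>
<author>
<name sortKey="Beckley, Russell" first="Russell" last="Beckley">Russell Beckley</name>
</author>
<author>
<name sortKey="Coombs, Jeffrey" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1007/978-3-642-22688-5_28</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URI (an assumption): it only makes "wicri:" parseable.
WICRI_NS = "urn:example:wicri"


def parse_record(text):
    """Extract title, author names and DOI from a record string."""
    # Declare the wicri prefix on the root element before parsing.
    patched = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    root = ET.fromstring(patched)
    stmt = root.find(".//titleStmt")
    return {
        "title": stmt.findtext("title"),
        "authors": [name.text for name in stmt.findall("author/name")],
        "doi": root.findtext(".//idno[@type='doi']"),
    }


info = parse_record(RECORD)
print(info["title"])
print("; ".join(info["authors"]))
print(info["doi"])
```

The same function should work on the full record, provided it starts with a bare `<record>` tag as it does here.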

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:9FB8946D596A1C6C39D5A5684E2FF26F722CD51B
   |texte=   Name Extraction and Formal Concept Analysis
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024